Cloud Deployment Instructions for DSND Capstone Project

This course taught you the basics of Spark to prepare you for Sparkify, one of the Capstone Project options available in the Data Scientist Nanodegree. For details on the project itself, see the Sparkify Project Overview page.

Deploying to the cloud is an optional part of the capstone project. Here, you'll find instructions for deploying with either Amazon Web Services (AWS) or IBM Cloud. If you choose AWS, you can use the full 12 GB dataset hosted on our public S3 bucket, and you should expect to spend about $30 to run the cluster for the week or so it takes to build your project. If you choose IBM Cloud, you'll use a medium-sized 23 MB dataset that we provide for download. You will still deploy your application on a Spark cluster, but it will not cost you any money.

We provide instructions for using Amazon EMR (Amazon Elastic MapReduce) and IBM Watson Studio to set up a Spark cluster and notebook. Both are great options for data scientists who want to spin up Spark clusters for projects without setting them up manually, a complicated process that requires some data engineering. With these tools, you'll be able to quickly create a Jupyter notebook, attach it to a Spark cluster, and open the notebook directly from the console.

To get started, go to the "Create an account" page for either the AWS or the IBM option.
